SparkSession.createDataFrame
SparkSession.createDataFrame(
data: Iterable[pyspark_dubber.sql.row.Row | dict[str, Any] | Any] | pandas.core.frame.DataFrame | numpy.ndarray,
schema: pyspark_dubber.sql.types.StructType | pyspark_dubber.sql.types.AtomicType | str | Sequence[str] | None = None,
samplingRatio: float | None = None,
verifySchema: bool = True,
)
Incompatibility Note
createDataFrame is a complex method, and some edge cases are not handled correctly. Notable incompatibilities with pyspark:
- numpy arrays are not yet accepted as input data type.
- samplingRatio is not honored.
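Since numpy arrays are not accepted yet, one possible workaround is to convert a 2-D array into a list of dicts, which the signature above does accept. The sketch below is illustrative only: the helper name rows_from_2d is hypothetical and not part of pyspark_dubber, and it assumes the array is two-dimensional with one column name per column.

```python
from typing import Any, Sequence


def rows_from_2d(values: Any, columns: Sequence[str]) -> list[dict[str, Any]]:
    """Turn a 2-D array-like into a list of dicts keyed by column name.

    Hypothetical helper, not part of pyspark_dubber.
    """
    # numpy arrays expose tolist(), which returns nested lists of Python
    # scalars; plain nested lists or tuples are used as they are.
    nested = values.tolist() if hasattr(values, "tolist") else values
    rows = []
    for record in nested:
        if len(record) != len(columns):
            raise ValueError(
                f"expected {len(columns)} values per row, got {len(record)}"
            )
        rows.append(dict(zip(columns, record)))
    return rows


# Plain nested lists stand in for `array.tolist()` here, so the example
# runs without numpy installed.
rows = rows_from_2d([[1, 2.0], [3, 4.0]], ["id", "score"])
print(rows)  # [{'id': 1, 'score': 2.0}, {'id': 3, 'score': 4.0}]
```

The resulting list can then be passed as the data argument, for example spark.createDataFrame(rows), assuming spark is an existing session object. Because samplingRatio is not honored, passing an explicit schema is likely the safer choice wherever pyspark code would have relied on sampled schema inference.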